### DECoNT Model Weights Reproduction

------------------------------------------------------------------------

Please make sure you have followed the package handling instructions given in the main README file in the DECoNT\ Result\ Reproduction directory.

------------------------------------------------------------------------


This folder, Weight Reproduction, includes four subfolders, one for each WES-based CNV caller.
To reproduce the weights, one should run a training.
To start a training, follow the steps below:

1) cd into the directory named after the model whose weights you want to reproduce (e.g. cd into /Weight\ Reproduction/XHMM to reproduce the weights for the XHMM model).

2) This directory contains two folders, named outputs and train_data, and a training script named train_DECoNT_XXXX.py. The outputs directory is used for writing the output files once the training process is done. These output files include the test-split data (the test split involves random shuffling of the data, so one can perform different tests after each training) and the trained model weights with the .h5 extension. The entire testing procedure can be repeated with the newly trained model weights instead of the ones provided in the main DECoNT Result Reproduction directory.

3) The train_data directory initially contains only a data.txt file, which provides links to publicly available 1000 Genomes data that can be processed with custom scripts for training. data.txt also includes a link to already processed, ready-to-use training data. After downloading the data files provided in this link into the train_data directory, delete the data.txt file and cd into the parent directory. Running the train_DECoNT_XXXX.py script will start the training process. By default, training runs on your computer's CPU; however, we highly recommend using multiple GPUs. Even when training is performed in parallel on the following GPUs: 3 NVIDIA GeForce RTX 2080 Ti (11 GB, 352-bit) and 1 NVIDIA TITAN RTX (24 GB, 384-bit), the training process takes 70, 12, 95 and 50 hours for XHMM, CoNIFER, CODEX2 and Control-FREEC, respectively. Without such GPU support, using only a CPU, one should expect training durations roughly 10 times longer than the aforementioned.
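As a rough planning aid, the multi-GPU runtimes quoted above and the ~10x CPU slowdown can be turned into a small estimate table. The numbers come directly from the timings above; the helper function itself is only illustrative:

```python
# Approximate DECoNT training times (hours) on the multi-GPU setup
# described above (3x RTX 2080 Ti + 1x TITAN RTX).
GPU_HOURS = {
    "XHMM": 70,
    "CoNIFER": 12,
    "CODEX2": 95,
    "Control-FREEC": 50,
}

def estimated_cpu_hours(caller: str, slowdown: int = 10) -> int:
    """CPU-only estimate: roughly 10x the multi-GPU runtime."""
    return GPU_HOURS[caller] * slowdown

for caller in GPU_HOURS:
    print(f"{caller}: ~{estimated_cpu_hours(caller)} h on CPU")
```

For example, a CPU-only XHMM training would be on the order of 700 hours, so GPU training is strongly preferred.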

4) After the training process finishes, the outputs directory contains the reproduced model weights in .h5 format along with the new test-split data in .npy format.
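The saved .npy test-split files can be reloaded with NumPy for repeated evaluation against the newly trained weights. A minimal, self-contained sketch of the loading pattern (the file name test_split.npy and the array shape are hypothetical stand-ins; substitute the actual files written to outputs/):

```python
import os
import tempfile

import numpy as np

# Stand-in for the outputs/ directory; in practice, point this at the
# real outputs folder produced by train_DECoNT_XXXX.py.
outdir = tempfile.mkdtemp()

# Create a dummy test-split array so this sketch runs on its own;
# the real .npy files are written by the training script.
np.save(os.path.join(outdir, "test_split.npy"), np.zeros((8, 4)))

# Loading pattern: reload the saved test split to rerun evaluation
# with the reproduced .h5 model weights.
test_split = np.load(os.path.join(outdir, "test_split.npy"))
print(test_split.shape)
```

Because the test split is random per training run, keeping these .npy files alongside the .h5 weights is what makes a given evaluation reproducible.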